83 research outputs found

Transfer from Multiple MDPs

Author: Lazaric Alessandro
Restelli Marcello
Publication date: 01/01/2011

Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a given target task. In this paper, we investigate the theoretical properties of this transfer method and introduce novel algorithms that adapt the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem. Comment: 201

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Archivio istituzionale della ricerca - Politecnico di Milano

INRIA a CCSD electronic archive server

Estimating Maximum Expected Value through Gaussian Approximation

Author: D'Eramo Carlo
Nuara Alessandro
Restelli Marcello
Publication venue: JMLR.org
Publication date: 01/01/2016

Archivio istituzionale della ricerca - Politecnico di Milano

A Novel Confidence-Based Algorithm for Structured Bandits

Author: Lazaric Alessandro
Restelli Marcello
Tirinzoni Andrea
Publication date: 01/01/2020

We study finite-armed stochastic bandits where the rewards of each arm might be correlated with those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem and rapidly discard all sub-optimal arms. In particular, unlike standard bandit algorithms with no structure, we show that the number of times a suboptimal arm is selected may actually be reduced thanks to the information collected by pulling other arms. Furthermore, we show that, in some structures, the regret of an anytime extension of our algorithm is uniformly bounded over time. For these constant-regret structures, we also derive a matching lower bound. Finally, we demonstrate numerically that our approach better exploits certain structures than existing methods. Comment: AISTATS 202

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Estimating the maximum expected value in continuous reinforcement learning problems

Author: D'Eramo Carlo
Nuara Alessandro
Pirotta Matteo
Restelli Marcello
Publication venue: AAAI press
Publication date: 01/01/2017

This paper is about the estimation of the maximum expected value of an infinite set of random variables. This estimation problem is relevant in many fields, such as Reinforcement Learning (RL). In RL it is well known that, in some stochastic environments, a bias in the estimation error can increase the approximation error step by step, leading to large overestimates of the true action values. Recently, some approaches have been proposed to reduce this bias and obtain better action-value estimates, but they are limited to finite problems. In this paper, we build on the recently proposed weighted estimator and on Gaussian process regression to derive a new method that natively handles infinitely many random variables. We show how these techniques can be used to address RL problems with both continuous states and continuous actions. To evaluate the effectiveness of the proposed approach, we perform empirical comparisons with related approaches.
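The overestimation mentioned in this abstract can be illustrated on a finite toy problem. The sketch below is not the paper's method (which relies on a weighted estimator and Gaussian process regression); it only compares the plain maximum of sample means with the well-known double estimator, under the assumption that all variables are standard Gaussians with true mean 0, so the true maximum expected value is 0. All function names and parameter values are hypothetical.

```python
import random
import statistics


def max_estimator(samples):
    """Maximum of the sample means; tends to overestimate the true maximum."""
    return max(statistics.fmean(s) for s in samples)


def double_estimator(samples):
    """Select the arg-max on one half of the data, evaluate it on the other half."""
    half_a = [s[: len(s) // 2] for s in samples]
    half_b = [s[len(s) // 2:] for s in samples]
    best = max(range(len(samples)), key=lambda i: statistics.fmean(half_a[i]))
    return statistics.fmean(half_b[best])


def average_estimate(estimator, n_vars=10, n_samples=10, n_trials=2000, seed=0):
    """Average an estimator over many trials in which every true mean is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        samples = [
            [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_vars)
        ]
        total += estimator(samples)
    return total / n_trials


# The true maximum expected value is 0, so each average is an estimate of the bias.
bias_max = average_estimate(max_estimator)
bias_double = average_estimate(double_estimator)
print(f"max estimator: {bias_max:+.3f}, double estimator: {bias_double:+.3f}")
```

In this toy setting the maximum estimator should come out clearly positive, while the double estimator should stay close to zero, at the cost of using only half of the samples for the final estimate.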

Archivio istituzionale della ricerca - Politecnico di Milano

Association for the Advancement of Artificial Intelligence: AAAI Publications

Learning in Non-Cooperative Configurable Markov Decision Processes

Author: Concetti Alessandro
Metelli Alberto Maria
Ramponi Giorgia
Restelli Marcello
Publication venue: Curran Associates, Inc.
Publication date: 01/01/2021

The Configurable Markov Decision Process framework includes two entities: a Reinforcement Learning agent and a configurator that can modify some environmental parameters to improve the agent's performance. This presupposes that the two actors have the same reward function. What if the configurator does not have the same intentions as the agent? This paper introduces the Non-Cooperative Configurable Markov Decision Process, a setting that allows two (possibly different) reward functions for the configurator and the agent. Then, we consider an online learning problem, where the configurator has to find the best among a finite set of possible configurations. We propose two learning algorithms to minimize the configurator's expected regret, which exploit the problem's structure, depending on the agent's feedback. While a naive application of the UCB algorithm yields a regret that grows indefinitely over time, we show that our approach suffers only bounded regret. Furthermore, we empirically show the performance of our algorithm in simulated domains.

Archivio istituzionale della ricerca - Politecnico di Milano

Best Arm Identification for Stochastic Rising Bandits

Author: Metelli Alberto Maria
Montenegro Alessandro
Mussi Marco
Restelli Marcello
Trovó Francesco
Publication date: 15/02/2023

Stochastic Rising Bandits is a setting in which the values of the expected rewards of the available options increase every time they are selected. This framework models a wide range of scenarios in which the available options are learning entities whose performance improves over time. In this paper, we focus on the Best Arm Identification (BAI) problem for stochastic rested rising bandits. In this scenario, we are asked, given a fixed budget of rounds, to provide a recommendation about the best option at the end of the selection process. We propose two algorithms to tackle this setting, namely R-UCBE, which resorts to a UCB-like approach, and R-SR, which employs a successive reject procedure. We show that they provide guarantees on the probability of properly identifying the optimal option at the end of the learning process. Finally, we numerically validate the proposed algorithms in synthetic and realistic environments and compare them with the currently available BAI strategies.
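For readers unfamiliar with successive reject procedures, the sketch below shows the classic fixed-budget Successive Rejects scheme for stationary arms. It is not R-SR, which is designed for rising rewards and whose details are not given in this abstract. The function name, the `pull` callback, and the example means are all hypothetical.

```python
import math
import random


def successive_rejects(pull, n_arms, budget):
    """Classic Successive Rejects: drop the empirically worst arm after each phase.

    pull(arm) returns one reward sample; budget is the total number of pulls allowed.
    Returns the recommended arm and the number of pulls actually used.
    """
    if n_arms < 2 or budget <= n_arms:
        raise ValueError("need at least 2 arms and a budget larger than n_arms")
    log_bar = 0.5 + sum(1.0 / i for i in range(2, n_arms + 1))
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    prev_n = 0
    for phase in range(1, n_arms):
        # Cumulative number of pulls each surviving arm must have after this phase.
        n_k = math.ceil((budget - n_arms) / (log_bar * (n_arms + 1 - phase)))
        for arm in active:
            for _ in range(n_k - prev_n):
                sums[arm] += pull(arm)
                counts[arm] += 1
        prev_n = n_k
        # Reject the arm with the lowest empirical mean among the active ones.
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)
    return active[0], sum(counts)


# Illustrative run with assumed Gaussian rewards (means and noise level are made up).
means = [0.2, 0.5, 0.9, 0.4]
rng = random.Random(0)
best, used = successive_rejects(lambda a: rng.gauss(means[a], 0.1), len(means), budget=200)
print(f"recommended arm: {best}, pulls used: {used}")
```

The phase lengths are chosen so that the total number of pulls never exceeds the budget, and every surviving arm has been sampled equally often when a rejection is made.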

arXiv.org e-Print Archive

A Nanocryotron Ripple Counter Integrated with a Superconducting Nanowire Single-Photon Detector for Megapixel Arrays

Author: Berggren Karl K.
Bienfang Joshua C.
Buzzi Alessandro
Castellani Matteo
Colangelo Marco
Foster Reed A.
Medeiros Owen
Restelli Alessandro
Publication date: 11/07/2023

Decreasing the number of cables that bring heat into the cryocooler is a critical issue for all cryoelectronic devices. Especially, arrays of superconducting nanowire single-photon detectors (SNSPDs) could require more than $10^6$ readout lines. Performing signal processing operations at low temperatures could be a solution. Nanocryotrons, superconducting nanowire three-terminal devices, are good candidates for integrating sensing and electronics on the same technological platform as SNSPDs in photon-counting applications. In this work, we demonstrated that it is possible to read out, process, encode, and store the output of SNSPDs using exclusively superconducting nanowires. In particular, we present the design and development of a nanocryotron ripple counter that detects input voltage spikes and converts the number of pulses to an $N$-digit value. The counting base can be tuned from 2 to higher values, enabling higher maximum counts without enlarging the circuit. As a proof-of-principle, we first experimentally demonstrated the building block of the counter, an integer-$N$ frequency divider with $N$ ranging from 2 to 5. Then, we demonstrated photon-counting operations at 405 nm and 1550 nm by coupling an SNSPD with a 2-digit nanocryotron counter partially integrated on-chip. The 2-digit counter operated in either base 2 or base 3 with a bit error rate lower than $2 \times 10^{-4}$ and a maximum count rate of $45 \times 10^6\,\mathrm{s}^{-1}$. We simulated circuit architectures for integrated readout of the counter state, and we evaluated the capabilities of reading out an SNSPD megapixel array that would collect up to $10^{12}$ counts per second. The results of this work, combined with our recent publications on a nanocryotron shift register and logic gates, pave the way for the development of nanocryotron processors, from which multiple superconducting platforms may benefit.
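The counting scheme described in this abstract (pulses accumulated into a multi-digit value with a tunable base) can be mimicked in software. The sketch below is a purely logical model of a generic ripple counter, not a model of the nanocryotron circuit. It assumes each stage wraps after `base` pulses and passes a carry to the next stage, and the function name is hypothetical.

```python
def ripple_count(n_pulses, n_digits, base=2):
    """Count pulses with a chain of divide-by-base stages (least significant first)."""
    digits = [0] * n_digits
    for _ in range(n_pulses):
        for i in range(n_digits):
            digits[i] += 1
            if digits[i] < base:
                break          # no carry, so the pulse stops at this stage
            digits[i] = 0      # stage wraps and the carry ripples to the next one
    return digits


# A 2-digit counter holds at most base**2 - 1 counts before wrapping around.
print(ripple_count(3, n_digits=2, base=2))  # [1, 1], i.e. 3 in base 2
print(ripple_count(5, n_digits=2, base=3))  # [2, 1], i.e. 5 = 1*3 + 2 in base 3
```

Raising the base from 2 to 3 increases the maximum count of a 2-digit counter from 3 to 8 without adding stages, which matches the abstract's remark about higher maximum counts without enlarging the circuit.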

arXiv.org e-Print Archive